Minimax-optimal semi-supervised regression on unknown manifolds
نویسندگان
چکیده
We consider the problem of semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple approach to this problem is to first use both the labeled and unlabeled data to estimate the manifold geodesic distance between pairs of points, and then apply a k nearest neighbor regressor based on these distance estimates. We prove that given sufficiently many unlabeled points, this simple method which we dub geodesic kNN regression, achieves the optimal finite-sample minimax bound on the mean squared error, as if the manifold were completely specified. Furthermore, we show how this approach can be efficiently implemented requiring only O(kN logN) operations to estimate the regression function at all N labeled and unlabeled points. We illustrate this approach on two datasets with a manifold structure: indoor localization using WiFi fingerprints and facial pose estimation. For both problems we obtain better results than the popular procedure based on Laplacian eigenvectors.
منابع مشابه
Statistical Analysis of Semi-Supervised Regression
Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regulari...
متن کاملMinimax Binary Classifier Aggregation with General Losses
We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification e...
متن کاملOptimal Binary Classifier Aggregation for General Losses
We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification e...
متن کاملOn Minimax Optimal Offline Policy Evaluation
This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a minimax risk lower bound, and analyze the risk of two standard estimators. It is shown, and verified in simulation, that one is minimax optimal up to a constant, whi...
متن کامل